Introduction
the ‘cause’ of language
- previously: Universal Grammar seen as a cause
↕
recent views
- some effects arise without particular causes, as the result of random interactions of large numbers of elements in complex systems
- ⇒ effects emerge
Computer simulations of language change notes
This website collects my personal notes on Computer simulations of language change. These notes are provided to bring full transparency to my research process. Of course, since they are only notes, they do not reflect my final thoughts on a topic, and should not be interpreted as such. To read finished papers, please consult my website. Do not use these notes as a basis for your own scientific research. Start from high-quality, peer-reviewed scientific literature instead.
Abstract
An understanding of language as a complex system helps us to think differently about linguistics, and helps us to address the impact of linguistic interaction. This book demonstrates how the science of complex systems changes every area of linguistics: how to make a grammar, how to think about the history of language, how language works in the brain, and how it works in social settings. Kretzschmar argues that to construct the best grammars of languages, it is necessary to understand the complex system of speech. Each chapter makes specific recommendations for how linguists should manage empirical data in order to form better generalizations about a language and its varieties. The book will be welcomed by students and scholars working in linguistics and English language, especially the study of language variation and the historical development of English.
the ‘cause’ of language
↕
recent views
“order for free” (Kauffman 1995: 83)
↓
speech / language in use
The most basic assumption of generative and structural linguistics, that we speakers all share the system of a language, share the rules for a language, is simply wrong. We all participate in speech, but the language is a little different, both in its available features and in the frequency with which we use those features, for each one of us individually and for each of us as a participant in every group to which we belong.
generalisations
Definition of a complex system (Mitchell 2009: 13)
a complex system is “a system in which large networks of components with no central control and simple rules of operation give rise to complex collective behavior, sophisticated information processing, and adaptation via learning or evolution.”
“complex”?
Complexity science at Santa Fe Institute (SFI)
doing complexity science required a commitment to observation, experience, and experiment that is in balance with the transdisciplinary, out-of-the-box curiosity that gave rise to the original question. It’s not that outcomes in complex adaptive systems are repeatable as they are in many scientific disciplines; complex systems are by definition unpredictable, and often downright squirrelly. But finding the patterns embedded in complex systems requires a distinct brand of scientific rigor and methodological approaches that in many cases haven’t yet been invented.
‘adversary’ of complexity science
way of working
While complexity science was taking off at SFI, it did receive some early allusive discussion in linguistics: Lindblom, MacNeilage, and Studdert-Kennedy published a 1984 paper on self-organizing processes in phonology; Paul Hopper presented his seminal paper called “Emergent Grammar” in Berkeley in 1987 (see Chapters 3 and 4); Ronald Langacker published a chapter titled “A Usage-Based Model” for cognitive linguistics in 1988. Gradually more papers attempting to use complex systems in linguistics appeared in the 1990s, such as Van Geert (1991). In 1996 Edgar Schneider presented a paper whose title was a question, “Chaos Theory as a Model for Dialect Variability and Change?” (published 1997). At that time, it had already been over twenty years since the original paper on climate by Edward Lorenz that asked the question, “Does the Flap of a Butterfly’s Wings in Brazil Set off a Tornado in Texas?” (1972), and over ten years since the founding of the SFI, where chaos theory was studied as part of the emerging field of complexity science. But it was very early for a student of language to consider the subject as a serious model for speech. In the same year, J. K. Chambers commented in a book review that “We will need a coterie of sociolinguists expert in chaos theory before we can make a start [at applications to our field]” (1996: 163). Chambers noted that the biggest problem for language applications then was that chaos theory seemed to require a long series of observations over time, a rare commodity for those who systematically record language in use.
ants
features of an ant colony
RIP John Conway 😢
Game of Life
initial state
cellular automaton
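A minimal sketch of the Game of Life as a cellular automaton, assuming Conway's standard rules (a dead cell comes alive with exactly three live neighbours; a live cell survives with two or three). The set-of-coordinates representation and the name `life_step` are my own choices for illustration, not something taken from the book.

```python
from collections import Counter

def life_step(live):
    """Return the next generation for a set of live (x, y) cells."""
    # Count, for every cell next to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 neighbours, survival with 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

# Initial state: a "blinker", three live cells in a row.
blinker = {(0, 1), (1, 1), (2, 1)}
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Every cell applies the same local rule and nothing coordinates the grid as a whole, yet the blinker keeps flipping between a horizontal and a vertical bar: a stable pattern that follows only from the initial state and the local interactions.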
Basic principles of complex systems
1. continuing dynamic activity in the system
2. random interaction of large numbers of components
3. exchange of information with feedback
4. reinforcement of behaviors
5. emergence of stable patterns without central control
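The five principles above can be sketched as a toy simulation. This is only an illustration under my own assumptions, not a model proposed in the book: speakers either reuse a randomly chosen earlier token (feedback and reinforcement) or, with a small probability `innovation`, introduce a new variant. The function name, the parameters and the seed are all hypothetical. Processes of this kind are sometimes called Simon or Yule processes.

```python
import random
from collections import Counter

def simulate_usage(n_steps=20000, innovation=0.05, seed=1):
    """Simulate repeated choices among variants with reinforcement."""
    rng = random.Random(seed)
    tokens = [0]          # every use so far; variant 0 has been used once
    next_variant = 1
    for _ in range(n_steps):
        if rng.random() < innovation:
            # Occasionally a speaker introduces a new variant.
            tokens.append(next_variant)
            next_variant += 1
        else:
            # Otherwise a speaker repeats a previously heard token, so
            # frequent variants are proportionally more likely to be reused.
            tokens.append(rng.choice(tokens))
    return Counter(tokens)

counts = simulate_usage()
ranked = [n for _, n in counts.most_common()]
print(ranked[:5], "...", len(ranked), "variants in total")
```

No variant is given an advantage in advance and there is no central control, but the ranked counts should still come out strongly skewed: a few very frequent variants and a long tail of rare ones.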
More chaos echoes
The mathematician Benoit Mandelbrot claimed that “many patterns of Nature are so irregular and fragmented, that, compared to [standard geometry] Nature exhibits not simply a higher degree but an altogether different level of complexity” (1982: 1). His treatment of natural forms like the geometry of coastlines presented problems that could not be solved with traditional methods but required a new nonlinear mathematics, what he called “fractals.” Fractals are familiar to many of us through repeating graphic designs like the “Koch islands” in Figure 1.3. The basic properties of fractals – including scaling properties, as illustrated here – characterize many objects of study in the physical, natural, and social sciences, not just graphic designs.
Koch Island (p. 12)
| equilibrium system | non-equilibrium system |
|---|---|
| closed | open |
| do not exchange matter or energy with the outside | exchange energy or matter with the outside |
| components are balanced | components are continually negotiating |
Equilibrium systems
Low-energy equilibrium system
Kauffman offers the example of dropping a ball down the side of a bowl: the ball will roll up and down the sides but will eventually come to rest at the bottom of the bowl, at low-energy equilibrium.
High-energy equilibrium system
In the case of an energetic equilibrium system, again in Kauffman’s example, if we put a quantity of gas molecules into a tank, the molecules do not stay ordered in a group at the point of entry; they keep moving around in the tank, and according to the ergodic theory they move randomly through all of the statistically possible states of arrangement.
Non-equilibrium system
Kauffman sets the counter example of the small whirlpool that forms near drains: this ordered structure will be maintained as long as the drain remains open and water continues to flow. The order in such a non-equilibrium system is sustained by persistent dissipation of matter and energy, and thus the whirlpool can be called a “dissipative structure” of the kind described by Prigogine. No stirring is required to start the whirlpool, no single and simple cause. The water is subject to natural laws like gravity that makes the water drain, but no collection of laws completely explains the whirlpool because randomness in the molecules and conditions is involved in the emergence of every particular whirlpool.
chaotic systems
We still get a whirlpool or bubble patterns at different water levels, or if someone should wade in the water and disturb the flow.
Descriptions of self-organization typically use data collected in time series, and apply complex mathematical operations to generate “attractors,” as shown by Guastello and Liebovitch for psychology (Figure 1.9). As for the application of successive values to make Mandelbrot’s San Marco Dragon with a formula, successive measurements of real phenomena over time may create patterns when graphed that tend towards a fixed point (A), or create an oscillation or orbital shape (C, E), or other patterns whose regularity may be more difficult to see (chaotic, or “strange” attractors). Each successive moment in time corresponds to a “state” of the phenomenon being observed, whether it is traffic in a city or economic activity in a country or evolutionary development in a biological system.
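To make the attractor types concrete, here is a small sketch using the logistic map, a standard textbook system. It is my own choice of example and not one used in the quoted passage. The parameter values 2.8, 3.2 and 3.9 are commonly used illustrations of a fixed point, an oscillation and chaotic behaviour; the function name `logistic_orbit` is hypothetical.

```python
def logistic_orbit(r, x0=0.2, burn_in=1000, keep=8):
    """Iterate x -> r*x*(1-x) and return the states reached after burn-in."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1 - x)
        orbit.append(round(x, 4))
    return orbit

fixed_point = logistic_orbit(2.8)   # settles on a single value
oscillation = logistic_orbit(3.2)   # alternates between two values
chaotic = logistic_orbit(3.9)       # no short repeating pattern
```

Each value in the returned list is one successive “state” of the system, which is why this kind of analysis needs a long time series of observations.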
| complex systems | chaotic systems |
|---|---|
| settle into a very small number of states | occupy a very large number of states |
| undisturbed by small changes | |
attractor
time series in language studies
We do look back in time in historical linguistics to observe change, but we do not have enough information about how the members of a population were actually speaking at any given time to make any more than speculative judgments about the state of the language in the remote past.
The situation is little better for the recent past, when we may have more information from writing or even from recorded speech, but still no fair way to estimate how all of the members of a population were actually speaking (see Chapter 7).[4] This means that we need to look for the effects of complex systems in speech in the surveys and other collections of data that we can actually carry out.
[4] The problem of rich data also introduces greater dimensionality for the description of systems. Stephen Wolfram (2002) required over 1,000 pages to prepare a comprehensive description of the patterns created by the successive states of a one-dimensional cellular automaton (a set of eight boxes in a row, which can be either black or white), according to the application of different rules for how the on/off pattern would change between states. Kauffman’s light bulbs were arranged in a two-dimensional grid. Speakers as agents interact with each other in many different ways, and speakers can choose between many possibilities for any linguistic feature we wish to observe.
(my emphasis, also for the italics)
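The one-dimensional cellular automaton from the footnote can be sketched in a few lines. This assumes Wolfram's usual rule numbering (0 to 255) and, as my own simplification, a row of eight cells whose edges wrap around; rule 90 is picked only as an example, and `eca_step` is a hypothetical name.

```python
def eca_step(cells, rule):
    """Apply a Wolfram rule number (0-255) once to a row of 0/1 cells."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        left = cells[(i - 1) % n]      # edges wrap around (an assumption)
        centre = cells[i]
        right = cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right   # value 0..7
        # Bit number `neighbourhood` of the rule gives the new cell state.
        new_cells.append((rule >> neighbourhood) & 1)
    return new_cells

row = [0, 0, 0, 1, 0, 0, 0, 0]        # eight boxes, one of them black
history = [row]
for _ in range(3):
    history.append(eca_step(history[-1], rule=90))

for state in history:
    print("".join("█" if c else "·" for c in state))
```

Even this tiny system has 256 possible rules and many possible starting rows, which gives a sense of why describing the far richer interaction between speakers is so hard.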
↕
‘chaotic’ view of language
complex systems
1. continuing dynamic activity in the system
2. random interaction of large numbers of components
3. exchange of information with feedback
4. reinforcement of behaviors
Zipf’s Law
↓
| quantity | frequency |
|---|---|
| “few” | very frequent words |
| “some” | moderately frequent words |
| “most” | low frequency words |
[figures: the same frequency distribution plotted with linear scales on both axes and with logarithmic scales on both axes]
also called an A-curve
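A rough sketch of how such a rank-frequency table can be computed from a list of words, next to the frequency that Zipf's Law in its simplest form (frequency inversely proportional to rank) would predict. The sample sentence is a made-up toy example, not survey data, and the function names are my own.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, count) triples, most frequent first."""
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]

def zipf_expected(top_count, rank):
    """Frequency Zipf's Law predicts at a rank, given the rank-1 count."""
    return top_count / rank

# Hypothetical toy sample, not real survey data.
sample = "the cat and the dog and the bird saw the fox".split()
table = rank_frequency(sample)
for rank, word, count in table[:3]:
    print(rank, word, count, zipf_expected(table[0][2], rank))
```

Plotting `count` against `rank` on linear axes gives the steep drop and long tail of the A-curve; on logarithmic axes a Zipf-like distribution looks roughly like a straight line.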
place of occurrence
We now have an answer for Edgar Schneider’s question about chaos theory and speech. No, chaos theory is not a model for dialect variability and change. But speech does constitute a complex system, “at the edge of chaos.”
We can use the A-curve to define the relationship between what people actually say or write and the generalisations that we want to make from that behaviour.
↓
most common variants on the A-curve
↓
observational artifact
long tail variants on the A-curve
↳ distribution itself
So, for example, a diphthongal pronunciation of fog is what we expect from women, but not what we expect from the LAMSAS speakers overall. We can expect ‘weeks without rain’ to be called a dry spell as a normal word, but accept that many other words are possible, and understandable, as different variants. We can perceive the top-ranked variants of any linguistic feature for groups at any level of scale, and the fact that different variants for a given feature will be ranked more highly in different groups helps us to distinguish the language behavior of the group.
↳ ‘scale-free’ for language
choice of form
↓
feedback and reinforcement
linguistic “system”
↳ frequency effects
Linguistic systems as low-energy equilibrium systems are always observational artifacts of our perception, in effect transformations of speech data from its natural existence as part of a complex system.
Analysing language in the traditional way?
When we propose the existence of such hierarchical linguistic systems based [e.g. tree-based grammar] upon our perception of speech around us, as we certainly want to do and are justified to do by the distributional patterns of speech as a complex system, we need to be guided by the nonlinear distributional pattern of the evidence of language in use because no system that we describe is actually instantiated in the spoken interactions themselves.
(own interpolation)
skipped
We will see that a complete single grammar for a language could never be motivated by detailed observation of speech production, because the patterned distribution of features and their variants that always emerges from the complex system of speech cannot be captured by the binary logic of formal linguistic systems.
(my own emphasis)
What is grammar?
[W]hile grammar never exists as such in language in use, it can well exist as a description of regularities indirectly derived from speech performance by perceptual means. This is, in fact, just what all linguists do
long tail of the A-curve
Elizabeth Traugott slander
Bybee (2001) focus
↳ phonology
networked units
In network models, internal structure is emergent – it is based on the network of connections built up among stored units. The stored units are pronounceable linguistic forms – words or phrases stored as clusters of surface variants organized into clusters of related words . . . Units such as syllables and segments emerge from the inherent nature of the organization of gestures for articulation.
(Bybee 2001: 85)
gradient storage
The view that phonological representations are self-organizing means that units of analysis, such as segments and syllables, are emergent units and are permitted to have gradient properties. This view does not insist upon one unit of uniform size for describing all speech, but rather proposes that the organization of linguistic material into units depends entirely upon the substantive properties of that material.
(Bybee 2001: 86)
⚠ ↓ problem!
Bybee’s a priori assumption of linguistic structure
As we have seen in Chapter 1, every individual is at the nexus of many groups of people, regional and social, and each one of those groups will have its own distributional pattern of many variants for any feature we name.
There is no necessity to privilege the units that linguists normally talk about, like phonemes, which derive essentially from the simplifying generalization of a formal model.
In her 2001 book Bybee tried to marry simple ideas of self-organization and emergence from complex systems with the assumption of structure, but she was doomed to fail, if influentially, by her superficial use of complexity and by her failure to break away from the assumption of structure.
Anthe summary
To recapitulate, the issue here is that Bybee assumes there is such a thing as ‘structure’, whereas according to Kretzschmar structure does not exist in reality and is instead derived from our perception of reality. At the same time, Bybee ignores the fact that an individual doesn’t store ‘a grammar’ in their mind, but rather a collection of different forms and structures distributed in an A-curve-like manner, which can be employed depending on the situation (and which manifests itself, for example, as social variation).
exemplar theory
Lots of important things going on here
The exact phonetic details of a word’s pronunciation arise because
A second challenge arises from the fact that differential phonetic outcomes relate specifically to word frequency. Standard generative models do not encode word frequency.
(Pierrehumbert 2001: 138)
in the exemplar model, “each category is represented in memory by a large cloud of remembered tokens of that category” (Pierrehumbert 2001: 139)
categorisation
↳ influencing factors
Schematic representation of categorisation structure
TODO this reminds me of the language acquisition paper from Speech Science. let’s look that up again later
The new token is symbolized by the asterisk at the bottom of the figure, and the task is to decide whether the token belongs to the category for the /ɛ/ phoneme or for the /ɪ/ phoneme along the one dimension presented here, F2. In this ambiguous case, the new token is closer to the main distribution of /ɪ/ remembered tokens, even though the actual distribution overlaps, and so it would be assigned to the /ɪ/ category.
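The categorisation step can be sketched as follows. This is a simplified illustration and not Pierrehumbert's actual formulation: each remembered token supports a new token with a weight that decays exponentially with distance on F2, and the category with the highest summed activation wins. The F2 values, the `window` parameter and the function names are all hypothetical.

```python
import math

def activation(token_f2, exemplars, window=100.0):
    """Sum similarity-weighted support from remembered tokens (F2 in Hz)."""
    return sum(math.exp(-abs(token_f2 - e) / window) for e in exemplars)

def categorise(token_f2, clouds, window=100.0):
    """Assign the token to the category whose exemplar cloud is most active."""
    return max(clouds, key=lambda label: activation(token_f2, clouds[label], window))

# Hypothetical F2 values (Hz) for remembered tokens; the clouds overlap.
clouds = {
    "ɪ": [1950, 2000, 2050, 2100, 2150, 2200],
    "ɛ": [1700, 1750, 1800, 1850, 1900, 1980],
}
print(categorise(2020, clouds))   # lies in the overlap, nearer the bulk of /ɪ/
print(categorise(1780, clouds))
```

Because activation is a sum over remembered tokens, a category with more exemplars gets more support automatically. In this sketch that is the only place where frequency enters, in line with the idea that frequency is intrinsic to the representation rather than overtly encoded.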
multidimensionality
The exemplar approach associates with each category of a system a cloud of detailed perceptual memories. The memories are granularized as a function of the acuity of the perceptual system (and possibly as a function of additional factors). Frequency is not overtly encoded in the model. Instead, it is intrinsic to the cognitive representations for the categories. More frequent categories have more exemplars and more highly activated exemplars than less frequent categories (Pierrehumbert 2001: 142).
Benefits of the exemplar model
Problems with Pierrehumbert model
1. it is too simple
2. the pattern is wrong
3. statistical learning in language acquisition
↳ role of frequency
4. phoneme system
5. prototype model
While Pierrehumbert’s exemplar model makes an improvement in the management of frequency information over Bybee’s earlier high-level generalizations, it does not yet get all the way to complex systems because it addresses neither the nonlinear distribution of realizations nor their scale-free distribution.
key point
usage-based idea
One particular verb accounts for the lion’s share of tokens of each argument frame considered in an extensive corpus study . . . The dominance of a single verb in the construction facilitates the association of the meaning of the verb in the construction with the construction itself, allowing learners to get a “fix” on the construction’s meaning. . . . In this way, grammatical constructions may arise developmentally as generalizations over lexical items in particular patterns.
↳ traditional linguistic practice
The good
1. generalisation
2. association of constructions with cognitive processing
3. free manner of categorisation
The bad
it just doesn’t go far enough
view on grammar
“A plausible way to think of mature linguistic competence, then, is as a structured inventory of constructions, some of which are similar to many others and so reside in a more core-like center, and others of which connect to very few other constructions (and in different ways) and so reside more towards the periphery.”
(2003: 5–6)
recurrence
(some language acquisition data to prove a point – nothing special)
A usage-based theory of grammar in which the cognitive organization of language is based directly on experience with language. Rather than being an abstract set of rules or structures that are only indirectly related to experience with language, we see grammar as a network built up from the categorized instances of language use . . . The basic units of grammar are constructions, which are direct form-meaning pairings that range from the very specific (words or idioms) to the more general (passive construction, ditransitive construction), and from very small units (words with affixes, walked) to clause-level or even discourse-level units . . . Because grammar is based on usage, it contains many details of co-occurrence as well as a record of the probabilities of occurrence and cooccurrence. The evidence for the impact of usage on cognitive organization includes the fact that language users are aware of specific instances of constructions that are conventionalized and the multiple ways in which frequency of use has an impact on structure.
usage-based linguistics ideas
their definition of grammaticalisation
Language as a [complex adaptive system] of dynamic usage and its experience involves the following key features:
(Ellis and Larsen-Freeman 2009a: 2)
↳ structures of language
language change
↳ problem
preference for, or “fixation” of, a grammatical state takes the dynamic movement of the complex system and freezes it, so that one variant of a feature becomes “grammatical” in the sense of having been selected
[F]requency distributions occur for any constructions we decide to nominate, but linguists are the ones who create the categories, who make the grammar. The operation of complex systems does not create a network out of which we observe a state as an object. The complex system merely creates a nonlinear frequency distribution, and linguists think that they see a grammar in the distribution after the fact.
Language as a complex adaptive system
When linguistic structure is viewed as emergent from the repeated application of underlying processes, rather than given a priori or by design, then language can be seen as a complex adaptive system . . . The primary reason for viewing language as a complex adaptive system, that is, as being more like sand dunes than like a planned structure, such as a building, is that language exhibits a great deal of variation and gradience.
(Bybee 2010: 2)
↳ gradience
↳ variation
Constructions also have exemplar representations, but these will be more complex, because, depending upon how one defines them, most or all constructions are partially schematic – that is, they have positions that can be filled by a variety of words or phrases.
Problems
1. prototypes
2. grammaticalisation
3. traditional categories
4. grammaticalisation paths as strange attractors
Bybee does not yet deal extensively with scaling or with characteristic frequency profiles. She has become the most advanced advocate of complex adaptive systems among usage-based linguists along with Ellis and Larsen-Freeman, and yet her work still has a way to go to escape from the Scholasticism of linguistics that Walker Percy raised, and to become more like the Galileo that Percy said we needed. As with the Five Graces, Bybee continues to carry old baggage from formal linguistics.
What have we learned about usage-based linguistics and complex systems?
1. non-linear distributions
2. linguistic categories are not naturally given and discrete
Exemplar theory suggests a cognitive process by which we can build frequency profiles for speech sounds, although the process of categorization remains an issue there. Still, we need to use categories so that we can count variant expressions and observe the non-linear distributions. The idea of “constructions” is perfect for this purpose, since the use of constructions does not entail a commitment to a fixed hierarchy of categories, and it may be applied to features as small as speech sounds or as large as discourse patterns.
There is no reason that we cannot use traditional terms like “noun” or “verb” or “ditransitive” to name them, as long as we do not invoke an entire hierarchical system when we do so.
grammaticalisation as defined by Hopper (1987)
Neither does the nonlinear distribution constitute evidence that there is a particular cause for the top-ranked variant to be where it is. The complex interaction of recurrence, frequency, and setting for language use rules out any simple cause for the state of a feature or language at any given time, except to say that the process of the complex system of speech always creates nonlinear distributions.
(my emphasis)
3. scaling
If we take care to match the assessments we make to the particular populations from which our data comes, we can make better generalizations, whether for a language as a whole, for national or regional varieties, for social groups, or for particular kinds of texts. The complex interaction of recurrence, frequency, and setting for language looks different from every point of view at every scale of analysis.
4. speakers as agents
When we are aware of the magnitude of the distributional problem, we certainly know that whatever experiment we conduct will never be enough to give us the kind of big-picture answers that formal linguists have generated.
↓
big typological conclusion
If we stop trying to bring along the old baggage of fixed grammars and selected usages, we are free to choose constructions to count and free to see nonlinear frequency patterns wherever they occur.
↳ constructions
Every experiment that we conduct, if it adequately describes the constructions it studies and the population of speakers who use them, makes another contribution to our knowledge of the complex system of language in use.
What we stand to gain more generally by such repeated studies, even though we know in advance that studies at different scales of analysis will not be comparable in their findings, is a new understanding of the operation of complex systems in human culture. The big picture is not a grammar, but instead a new way of understanding the humanities as the emergence of linguistic and cultural patterns out of continual human interaction. Continual movement at every level of scale is crucial to that understanding.
The most important point is that constructions are nothing more or less than patterns of usage, which may therefore become relatively abstract if these patterns include many different kinds of specific linguistic symbols. But never are they empty rules devoid of semantic content or communicative function. In usage-based approaches, countless rules, principles, parameters, constraints, features, and so forth are the formal devices of professional linguists; they simply do not exist in the minds of speakers of a natural language.
(Tomasello 2003: 100)
(my emphasis, it’s not clear what the last two sentences mean)
Kretzschmar POV
Let us say that each of the thirty gradations on the PlanetMath curve represents one hundred different words. Then, as in Figure 4.3, the first five or six gradations at the left of the graph, about 500 or 600 words, account for about 80 percent of all the running words in the text, while the remaining 24 or 25 gradations in the long tail at the right, the other 2,400 or 2,500 different words, account for only about 20 percent of the running words.
Pareto Principle / 80/20 rule
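The arithmetic in the passage above can be checked with a small sketch. It assumes an idealised distribution in which the frequency of a word is proportional to 1 / rank (my assumption for illustration; real texts deviate from it), with 3,000 different words, i.e. thirty gradations of one hundred words each. The function name `coverage` is hypothetical.

```python
def coverage(n_types, top_k, exponent=1.0):
    """Share of running words covered by the top_k most frequent types,
    assuming frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_types + 1)]
    return sum(weights[:top_k]) / sum(weights)

# 30 gradations of 100 words each, as in the passage above.
share_top_600 = coverage(n_types=3000, top_k=600)
print(f"top 600 of 3000 word types: {share_top_600:.0%} of running words")
```

Under this idealisation the first six gradations cover roughly four fifths of the running words, which is where the 80/20 label comes from. Changing `exponent` or `n_types` shifts the split, so 80/20 is a rule of thumb rather than a constant.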
problem with Zipf’s law
Zipf’s formula has been superseded by, among others, the mathematician Mandelbrot (who spent his career at IBM). Mandelbrot’s improved formula (1968) shows that the top-ranked words on the curve deviate from the frequency that Zipf expected, and the lower-ranked words also deviate, owing to what he called “the wealth of vocabulary.”
In Linguistic Atlas survey data (not written words in continuous discourse but spoken words and phrases gathered in the field), the top-ranked variant is often three, four, five, even ten times more frequent than the second-ranked variant, and we also see curves that are shallower than a 2:1 ratio between the first and second variant (discussed in depth in Chapter 7).
If Zipf’s Law were really a law, in the same way that thermodynamics and gravity are natural laws, then Zipf’s Law just does not work well enough.
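For reference, the two formulas side by side. These are the standard textbook forms, not quotations from the book; the symbols $f$ (frequency), $r$ (rank), $C$, $a$ and $b$ are my own notation.

```latex
% Zipf: frequency falls off as a power of rank (classically a is about 1)
f(r) = \frac{C}{r^{a}}

% Zipf-Mandelbrot: an extra offset b changes the behaviour of the top ranks
f(r) = \frac{C}{(r + b)^{a}}, \qquad b \ge 0

% Ratio between the first- and second-ranked item under each formula
\frac{f(1)}{f(2)} = 2^{a}
\qquad\text{versus}\qquad
\frac{f(1)}{f(2)} = \left(\frac{2 + b}{1 + b}\right)^{a}
```

With $a = 1$ and $b = 0$ the ratio between the first and second rank is exactly 2:1, the benchmark mentioned above. The Linguistic Atlas curves, which are sometimes much steeper and sometimes shallower than that, therefore cannot all be fitted with a single fixed pair of parameters.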
Traditional ⟷ complex-system grammar
| traditional grammar | complex-system grammar |
|---|---|
| grammar is a static structure of rules | grammar is open and dynamic |
| grammar consists of a hierarchical arrangement of rules | grammar consists of a very large number of interactive components/agents |
| grammar exhibits fixed relations between elements | grammar shows emergent order |
| grammar has binary distributions | grammar has non-linear frequency distributions |
| grammar has homogeneous unity | grammar has property of scaling |
neural network theory of activation patterns (Bermudez 2010: 215–245)
collocation slot fillers
What the 80/20 Rule tells us about grammatical rules, then, is that we know in advance that there will be exceptions. Indeed, exceptions are not rare events, because we can predict that the class of exceptions will account for about 20 percent of the instances of any feature we study. Moreover, the exceptions will account for about 80 percent of the different constructions possible for any feature. Once we understand that the 80/20 Rule is not a curiosity but instead the hallmark of a complex system, we can understand what we take to be grammatical regularities in a different way. Grammatical rules, it turns out, are not laws but more like guidelines.
Grammatical rules are, however, more than mere suggestions because they have a nonlinear frequency curve behind them. Indeed, it is highly likely that the 80/20 distributional pattern gave rise to the idea of grammar in the first place, because speakers of any language perceive that for any question about how to put words together, there will be one or a few constructions that occur a great majority of the time, in the 80 percent group. The idea that there is a fixed objective language hierarchy, a linguistic system, originates as an observational artifact, something that we just perceive to be there because we usually do one of just a few things for any construction.
↳ “epiphenomenal” grammar (Hopper 1987)
How can our knowledge of speech as a complex system help us to improve the grammars we create?
1. prescriptive rules have no foundation
Instead of presenting rule systems that could be confused for prescriptions, the grammar would freely admit its leakage and build “exceptions” into the discussion of regularities – as the Comprehensive Grammar already does for some rules.
(this is also already the case in the ANS for Dutch)
2. distinguish between language in use and rational linguistic structure
The 80/20 Rule suggests that there is no end to the problem of trying to write rules that can generate all of the acceptable sentences of the language. It cannot happen because of the long tail. Once we accept that infrequent constructions are normal parts of language in use, then we can understand that there are too many constructions to accommodate in any elegant rule system. It will, however, continue to be possible to write rule systems that account for the 80 percent group in the 80/20 Rule, and that is in fact what has mostly been happening already.
3. scale-free networks
Grammars can only be defined for the speech of one population at a time because, while the 80/20 Rule always applies, it will apply differently for every different population of speakers. There is literally an infinite number of possible grammars, because the number of possible groupings of speakers along the geographical/social continuum is infinite.
Longman grammar shows A-curves with different entries for different registers (p. 102)
How to accommodate Hopper’s sense of continual movement, and still be able to describe the grammar of the language at any moment in time?
process of change
↓ how to deal with it?
On the other hand, in structural grammars that collect paradigmatic lists of possible constructions, there is no good linguistic reason to privilege the most common variants as having been “selected” and therefore have status as being “grammatical” and to relegate less-common variants to “noise” in the system.
“change” in the complex system of speech for historians
The increased use of clockwork in the figurative sense has not eliminated the literal sense of clockwork, but the latter has been ousted to a certain degree. Yet it still survives in the low frequency slumber of language.
↓ exteriorisation
S-curve
S-curve ⟷ A-curve
The S-curve just describes the successive frequencies of a single variant at different moments in time. In 👁 ↓ we see two different A-curves that correspond to different moments in time for the same variant, with the position of the variant located on each curve.
(p. 115)
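The relation between the two curves can be sketched with made-up numbers. For each date, the ranked counts of all variants form an A-curve; following one variant across dates gives its share over time, which is what an S-curve plots. The counts, dates and variant labels below are hypothetical, as is the function name.

```python
from collections import Counter

def share_and_rank(counts, variant):
    """Return the variant's share of all tokens and its rank on the A-curve."""
    ranked = [v for v, _ in Counter(counts).most_common()]
    share = counts[variant] / sum(counts.values())
    return share, ranked.index(variant) + 1

# Hypothetical token counts for four variants of one feature at three dates.
periods = {
    1800: {"a": 70, "b": 20, "c": 8, "d": 2},
    1900: {"a": 45, "b": 15, "c": 5, "d": 35},
    2000: {"a": 15, "b": 10, "c": 5, "d": 70},
}
for year, counts in periods.items():
    share, rank = share_and_rank(counts, "d")
    print(year, f"share={share:.2f}", f"rank={rank}")
```

Variant "d" climbs from the long tail to the top rank, while the older variants do not disappear; they simply move down the curve.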
[T]his focus on frequency distributions, rather than qualitative change, also allows explicitly for what Laura Wright (2000: preface) has waggishly called “W curves” that describe increases and decreases in the frequency of the same form over time.
(p. 115)
(the rest of the chapter talks about English specifically and some evolutionary things)
(fluff)
neural networks
linguistic expectations
↓ so
speech in the brain
Bybee’s proposal that phonological information is stored on the basis of words (2001) must be understood in neuroscience not as the brain having some single physical location to store a word, not as a representation, but rather as the brain having a collection of interconnected neuronal pathways whose activation is related to a word.
(weird things going on here)
complexity science and language
sociolinguistics
“coexistent systems”
↳ language change in progress
Focus on systems
The heterogeneous character of the linguistic systems discussed so far is the product of combinations, alternations, or mosaics of distinct, jointly available subsystems. Each of these subsystems is conceived as a coherent, integral body of rules of the categorial, Neogrammarian type: the only additional theoretical apparatus needed is a set of rules stating the conditions for alternation.
(Weinreich, Labov, and Herzog 1968: 165)
variability in Labovian sociolinguistics
If we apply a complex systems view to this analysis, we can see that Labov has managed the subgroups in his data but has not changed what he claims to be studying. He begins with the top level of scale, New York City. He then subtracts the black speakers so that his results reflect only the white speakers, no longer all of New York City. Finally, he subtracts the upper- and lower-class whites so that he ends up with just the working-class white speakers, an even smaller part of the population of New York City. According to the scaling property of complex systems Labov is entitled to examine whichever groups of New York City speakers he wants, but he is then no longer entitled to say that he is still talking about New York City as a whole. According to complex systems it is simply an error to think that the working-class speakers in New York represent the whole city.
Style and class
The vernacular is positioned maximally distant from the idealized norm [citations of J. Milroy and S. Poplack]. Once the vernacular baseline is established, the multidimensional nature of speech behavior can be revealed . . . Thus, the unmonitored speech behavior of the vernacular enables us to tap in to the broader dimensions of the speech community. In other words, the vernacular is the foundation from which every other speech behavior can be understood.
(Tagliamonte 2006: 8)
↳ does the ‘vernacular’ exist?
evolution
‘coexistent systems’ and complex systems
skipped
A-curve
| Type A | Type B | Type C |
|---|---|---|
| single highly frequent variant | two highly frequent variants (“bump”) | three or more highly frequent variants |
| several variants with moderate frequencies | quite many variants with moderate frequencies | |
| many variants with low frequencies | | |
| long tail | | |
Type C curves tend to occur more often when the ratio of speakers per type of response is smaller, as when the number of speakers is small, or the data is subject to more finely graded differentiation of variation (as for the small phonetic differences), or there is an unusually large number of variants (as for cobbler).
We see that it is common, if not typical, for there to be a single variant that is top ranked in all the A-curves, massively more common than any other variant for the same item. However, at the same time, it is typically the case that there are many other variants, mostly the same ones in different subsamples, and they have different relative frequencies and thus different orders on the A-curve.
Our perception of categorical differences between populations of speakers, of common variants unique to particular populations, is simply not supported by the speech production evidence.
Type A
other types
As reported above, we should agree with Horvath and Horvath (2001, 2003) that it is a basic fallacy to think that the behavior of any smaller group will necessarily be the same as the behavior of the larger group of which it forms a part, or that the behavior of any group overall will predict the behavior of its sub-populations.
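A toy check of this fallacy, as a sketch only: rank the variants for the whole sample and for each subgroup, then compare the orders. The group labels, variant names, and counts are invented, and `rankings_by_group` is a hypothetical helper of mine.

```python
from collections import Counter, defaultdict


def ranked_variants(tokens):
    """Variants ordered from most to least frequent."""
    return [variant for variant, _ in Counter(tokens).most_common()]


def rankings_by_group(records):
    """records: iterable of (group, variant) pairs.

    Returns the overall ranking and a dict of per-group rankings.
    """
    by_group = defaultdict(list)
    everything = []
    for group, variant in records:
        by_group[group].append(variant)
        everything.append(variant)
    per_group = {g: ranked_variants(toks) for g, toks in by_group.items()}
    return ranked_variants(everything), per_group


# Invented tokens: variant "x" dominates overall, but not in the south group.
records = (
    [("north", "x")] * 8 + [("north", "y")] * 2
    + [("south", "x")] * 1 + [("south", "y")] * 3 + [("south", "z")] * 1
)
overall, per_group = rankings_by_group(records)
print(overall, per_group)
```

The overall top variant is not the top variant in every subgroup, so neither level of scale predicts the other.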
Kretzschmar, Kretzschmar & Brockman (2013)
↳ Lorenz Curve
The members of a population are represented on the x-axis by rank of the amount of wealth they hold, low to high in order left to right, and the percentage of the total wealth of the population is represented on the y-axis, so that a chart represents a cumulative distribution of wealth in society. In practice there is always a curve that represents the relative inequality of wealth in a population, as opposed to the hypothetical straight line of perfect equality of wealth.
↳ Gini Coefficient
When the Lorenz Curve is charted, then, the deeper the curve, the higher the Gini Coefficient, and the more unequal the wealth.
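For my own reference, a minimal sketch of both measures, assuming the standard definition of the Gini Coefficient as one minus twice the area under the Lorenz Curve (trapezoid rule). Applied to language, the "wealth" would be token counts per variant; the example counts below are invented, and I have not checked whether the book's authors compute it exactly this way.

```python
def lorenz_points(values):
    """Cumulative (population share, value share) points, lowest holder first."""
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini Coefficient as 1 - 2 * (area under the Lorenz Curve)."""
    pts = lorenz_points(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between two points
    return 1.0 - 2.0 * area


equal = gini([5, 5, 5, 5])          # straight line of perfect equality
one_holder = gini([0, 0, 0, 10])    # one member holds everything
a_curve_like = gini([100, 20, 10, 5, 2, 1, 1, 1])  # invented variant counts
print(equal, one_holder, a_curve_like)
```

With a finite sample the maximum is (n - 1) / n rather than 1, so the value for four members where one holds everything is 0.75.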
↓
Gini for phonetic data
Issues with Gini Coefficient
1. binning
If we consider the problem from the binning side, we can see that a very large number of categories, say one category per token, will not show the A-curve because it yields a linear chart with a slope of zero – a horizontal line. On the other hand, frequency data sorted into just two categories also gives us a line, with a slope that depends on the difference between the two category values. Thus the outer limits for the possible number of categories are both linear, and the nonlinear A-curve can only be observed when the number of categories into which the data is sorted lies between these two extremes.
This problem starts to make sense once you consider that speech can be expressed as formant data (F1, F2, F3, etc.), and that these continuous values have to be binned into categories before a Gini Coefficient can be computed.
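The two limiting cases are easy to reproduce. This is a sketch with made-up F1 values (drawn from a normal distribution purely for illustration) and equal-width bins, which is my assumption; the book does not prescribe a binning scheme here.

```python
import random
from collections import Counter


def bin_counts(values, n_bins):
    """Sort values into n_bins equal-width bins; return counts ranked high to low."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values to bin")
    width = (hi - lo) / n_bins
    counts = Counter()
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # maximum goes in last bin
        counts[idx] += 1
    return sorted(counts.values(), reverse=True)


# Hypothetical F1 measurements in Hz, not real data.
rng = random.Random(0)
f1_values = [rng.gauss(500, 80) for _ in range(200)]

# One category per token: every count is 1, so the ranked chart is flat.
per_token = sorted(Counter(f1_values).values(), reverse=True)

# Two categories: only two points, which can only ever form a line.
two_bins = bin_counts(f1_values, 2)

# Something in between: the only place a nonlinear profile can show up.
ten_bins = bin_counts(f1_values, 10)
print(per_token[:5], two_bins, ten_bins)
```

The number of bins is therefore a free parameter that changes the shape of the curve, and with it any Gini Coefficient computed from the counts.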
2. sample size
usefulness of Gini Coefficients
1. exact naming
So, for example, the classic sociolinguistic snowball (or “friend of a friend”) method of acquiring speakers in a community is likely to introduce bias since the people are known to each other, prima facie evidence that they are involved in a social network – so that we will be measuring speech just in the social network instead of speech in the larger community.
2. take interest in other levels of scale in which speakers participate
3. use randomised sampling
Having decided on one neighborhood, or another demographic segment, it is important to use randomized sampling to select participants. All members of the study group will also belong to other groups as well, and researchers need to avoid unintentional bias from those other connections.
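In practice this could be as simple as drawing from a full list of eligible speakers instead of following referrals. A minimal sketch using Python's standard `random` module; the sampling frame, the `draw_participants` helper, and the speaker IDs are hypothetical.

```python
import random


def draw_participants(frame, k, seed=None):
    """Draw k distinct participants uniformly at random from a sampling frame.

    Passing a seed makes the draw reproducible, which helps with transparency.
    """
    rng = random.Random(seed)
    return rng.sample(frame, k)


# Hypothetical sampling frame: every eligible speaker in the chosen neighborhood.
frame = [f"speaker_{i:03d}" for i in range(1, 201)]
participants = draw_participants(frame, k=20, seed=42)
print(participants)
```

Unlike a snowball sample, nobody ends up in the study because they know another participant, so existing social networks do not decide who is measured.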
sampling recommendations
4. account for non-linear frequency patterns
5. reanalyse old data with new insights
postmodernism
Another major target of Sokal and Bricmont (1998) is what they call “epistemic relativism,” the idea that “the truth or falsity of a statement is relative to an individual or to a social group” (Sokal and Bricmont 1998: 51), and this idea does come closer to postmodernism. As Sokal and Bricmont point out (Sokal and Bricmont 1998: 52):
There is no doubt that the relativist attitude is at odds with scientists’ idea of their own practice. While scientists try, as best they can, to obtain an objective view of (certain aspects of) the world [with allowance in a footnote for “nuances” of the word objective, as in doctrines like realism, conventionalism, and positivism], relativist thinkers tell them that they are wasting their time and that such an enterprise is, in principle, an illusion. We are thus dealing with a fundamental conflict.
Objective truth
But why did I do it? I confess that I’m an unabashed Old Leftist who never quite understood how deconstruction was supposed to help the working class. And I’m a stodgy old scientist who believes, naively, that there exists an external world, that there exist objective truths about that world, and that my job is to discover some of them. (If science were merely a negotiation of social conventions about what is agreed to be “true,” why would I bother devoting a huge fraction of my all-too-short life to it? I don’t aspire to be the Emily Post of quantum field theory.) (Sokal and Bricmont 1998: 269)
While there are admitted problems with what might be “objective,” Sokal insists on the objectivity of science as the anchor that saves us from an endless drift of negotiation of the etiquette of truth.
Postmodernist response
Stanley Aronowitz, one of the editorial team at Social Text, does not put things in quite the same way. In a reply to the Sokal hoax article, he attacks the notion of objectivity (Aronowitz 1997):
So the issue is not whether reality exists, but whether knowledge of it is “transparent.” Herein lies Sokal’s confusion. He believes that reason, logic, and truth are entirely unproblematic. He has an abiding faith that through the rigorous application of scientific method nature will yield its unmediated truth. According to this doctrine there are “objective truths” since the earth revolves around the sun, gravity exists and various other laws of nature are settled matters. So Sokal never interrogates the nature of evidence or facts, and simply accepts them if they have been adduced within certain algorithms that bear the stamp of “science.”
This statement is too strong, since we have seen that Sokal is willing to admit that non-scientific factors could have an influence, just not the primary influence on changes in scientific models. Still, Aronowitz does hit the mark in saying that, for Sokal, the scientific method of observation and reason are beyond attack. He continues:
The point [of studying social connections of science] is not to debunk science or to “deconstruct” it in order to show it is merely a fiction. This may be the postmodern project, but it is not the project of science studies. The point is to show science as a social process, to bring it down to earth, to remove the halo from its head. Scientific truth cannot be absolute; otherwise we might agree with those who have proclaimed the “end” of science. If all knowledge, including natural science, is mediated by the social and cultural context within which it has developed, then its truths are inevitably relational to the means at hand for knowing. In fact, in much of micro-physics what is called observation is often the effects of machine technologies, a reading of effects. But the reading is theory-laden. Which means pure description based on observation is not possible. Scientists require other tools such as machines, mathematics, and infer what they see from what they believe.
This is a very strong position. To claim that the use of technology, machines, must mean that “pure description based on observation is not possible” is surely an overstatement, and to say that scientists “infer what they see from what they believe” is a reactionary conclusion that, setting aside the opening disclaimer of the paragraph, does make science into a fiction. Aronowitz ends with the postmodern project that he denied a few lines before.
(this is really just interesting to me and not really relevant anymore, so I stopped taking notes)
So, it is a postmodern view to claim, as many of us now do, that speech is essentially local, and that language variation begins in small groups at the bottom of the scale-free network of speech that rises to broad regional and social continua of speech.
It should be clear from this volume that the only way we can understand the complex system of speech is to count tokens and to assemble frequency profiles for the variants of linguistic features.